ABSTRACT

Suppose that, in a distribution problem, the sample information $W$ is split
into two pieces $W_1$ and $W_2$, and the parameters involved are split into two
sets, $\phi$ containing the parameters of interest, and $\theta$ containing nuisance
parameters. It is shown that, under certain conditions, the posterior
distribution of $\phi$ does not depend on the data $W_2$, which can thus be
ignored. This also has consequences for the predictive distribution of future
(or missing) observations. In fact, under similar conditions, the predictive
distributions using $W$ or just $W_1$ are identical.
SIGNIFICANCE AND EXPLANATION


In  the  application  of  Bayesian  methods,  some  posterior  and  predictive 
distributions  may  be  unaffected  when  portions  of  the  data  are  ignored. 
Conditions  under  which  this  is  true  are  given,  and  examples  are  provided. 


The  responsibility  for  the  wording  and  views  expressed  in  this  descriptive 
summary  lies  with  MRC,  and  not  the  authors  of  this  report. 


DROPPING OBSERVATIONS WITHOUT AFFECTING POSTERIOR AND
PREDICTIVE DISTRIBUTIONS

Norman Draper and Irwin Guttman*


1.  MAIN  RESULT 

Suppose that $W$ is a vector of random variables whose density
function $f_W(w \mid \phi, \theta)$ depends on two sets of parameters $\phi, \theta$. Suppose further that
we are interested in $\phi$, that $\theta$ will be regarded as a vector of nuisance parameters,
and that the following conditions hold.

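1. $W_1$ and $W_2$ are statistically independent.

2. The marginal distribution $f_{W_2}(w_2 \mid \theta)$ of $W_2$ depends only on $\theta$ and not on $\phi$.

3. The marginal distribution of $W_1 = (W_{11}', W_{12}')'$ is such that

\[ f_{W_1}(w_1 \mid \phi, \theta) = f_{W_{12} \mid W_{11}}(w_{12} \mid w_{11}; \phi)\, f_{W_{11}}(w_{11} \mid \theta). \tag{1.1} \]

4. The prior information about the parameter sets $\phi$ and $\theta$ is such that

\[ p(\phi, \theta) \propto a(\theta)\, b(\phi), \tag{1.2} \]

so that $\theta$, $\phi$ are independent a priori.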

Theorem 1. Under conditions 1-4, the marginal posterior of $\phi$ based on $W_1$ and
$W_2$ does not depend on $W_2$.

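Proof. The posterior of $\phi$ given $(W_1', W_2')' = (w_1', w_2')'$ is

\[ p(\phi \mid w_1, w_2) \propto \int_{\theta} a(\theta)\, b(\phi)\, f_W(w \mid \phi, \theta)\, d\theta \tag{1.3a} \]

\[ = b(\phi) \int_{\theta} a(\theta)\, f_{W_2}(w_2 \mid \theta)\, f_{W_1}(w_1 \mid \phi, \theta)\, d\theta \tag{1.3b} \]

\[ = b(\phi)\, f_{W_{12} \mid W_{11}}(w_{12} \mid w_{11}; \phi) \times \int_{\theta} a(\theta)\, f_{W_2}(w_2 \mid \theta)\, f_{W_{11}}(w_{11} \mid \theta)\, d\theta. \tag{1.3c} \]

Here (1.3b) uses conditions 1, 2 and 4, and (1.3c) uses condition 3.
The integral in (1.3c) is clearly a function of $w_2$ and $w_{11}$, i.e., constant with
respect to $\phi$, and so may be absorbed into the constant of proportionality.

This proves the result stated.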
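As a minimal numerical sketch of Theorem 1, the following uses a hypothetical discrete toy model that is not taken from the report: $\theta$ and $\phi$ each take a few values, $W_{11}$ is Bernoulli($\theta$), $W_{12}$ given $W_{11}$ depends on $\phi$ only, and $W_2$ is Binomial($M$, $\theta$). All names and numerical values are assumptions chosen for illustration. Exact fractions are used so that the comparison is not blurred by rounding.

```python
from fractions import Fraction
from math import comb

# Hypothetical toy model (an assumption for illustration, not from the report).
THETAS = [Fraction(1, 5), Fraction(1, 2), Fraction(4, 5)]      # nuisance values
PHIS = [Fraction(3, 10), Fraction(7, 10)]                      # values of interest
A_PRIOR = {t: Fraction(1, 3) for t in THETAS}                  # a(theta)
B_PRIOR = {PHIS[0]: Fraction(1, 4), PHIS[1]: Fraction(3, 4)}   # b(phi)
M = 5  # number of Bernoulli(theta) trials summarised by W2


def f_w11(w11, theta):
    """Marginal of W11: Bernoulli(theta), free of phi."""
    return theta if w11 == 1 else 1 - theta


def f_w12_given_w11(w12, w11, phi):
    """Conditional of W12 given W11: depends on phi only (condition 3)."""
    p_one = phi if w11 == 1 else 1 - phi
    return p_one if w12 == 1 else 1 - p_one


def f_w2(w2, theta):
    """Marginal of W2: Binomial(M, theta), independent of W1 (conditions 1, 2)."""
    return comb(M, w2) * theta ** w2 * (1 - theta) ** (M - w2)


def posterior_phi(w11, w12, w2=None):
    """Marginal posterior of phi; w2=None means W2 is dropped."""
    weights = {}
    for phi in PHIS:
        total = Fraction(0)
        for theta in THETAS:
            like = f_w12_given_w11(w12, w11, phi) * f_w11(w11, theta)
            if w2 is not None:
                like *= f_w2(w2, theta)
            total += A_PRIOR[theta] * B_PRIOR[phi] * like
        weights[phi] = total
    norm = sum(weights.values())
    return {phi: w / norm for phi, w in weights.items()}


without_w2 = posterior_phi(1, 1)
for w2 in range(M + 1):
    # Whatever value W2 takes, the posterior of phi is unchanged.
    assert posterior_phi(1, 1, w2) == without_w2
```

In this sketch the sum over `theta` plays the role of the integral in (1.3c): it changes with `w2` but is the same for every `phi`, so it cancels on normalisation.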


2.  TWO  EXAMPLES 


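Example 2.1. Consider the bivariate normal distribution with vector of means
$\mu = (\mu_1, \mu_2)'$, variance-covariance matrix $\Sigma = ((\sigma_{ij}))$, and inverse $\Sigma^{-1} = ((c_{ij}))$.

Let

\[
\begin{aligned}
\alpha &= \mu_2 + c_{21}\mu_1/c_{22}, \\
\beta &= -c_{21}/c_{22}, \\
d_{11} &= c_{22}, \\
d_{22} &= c_{11} - c_{21}^2/c_{22}, \\
\eta_1 &= \mu_1,
\end{aligned}
\]

or

\[
\begin{aligned}
\mu_1 &= \eta_1, \\
\mu_2 &= \alpha + \beta\eta_1, \\
c_{11} &= d_{22} + \beta^2 d_{11}, \\
c_{22} &= d_{11}, \\
c_{21} &= -\beta d_{11}.
\end{aligned}
\tag{2.1}
\]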

We remark in passing that, if (2.1) is considered as a transformation from
$(\mu_1, \mu_2, c_{11}, c_{22}, c_{21})$ to $(\eta_1, \alpha, \beta, d_{11}, d_{22})$, then the Jacobian has absolute value $d_{11}$.
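The Jacobian can be checked directly. The following is a short verification sketch, with rows ordered as $(\mu_1, \mu_2, c_{11}, c_{22}, c_{21})$ and columns as $(\eta_1, \alpha, \beta, d_{11}, d_{22})$; this ordering is a choice made here for convenience. The matrix is block upper triangular, so its determinant is the product of the determinants of the two diagonal blocks.

```latex
\[
J =
\begin{pmatrix}
1 & 0 & 0 & 0 & 0 \\
\beta & 1 & \eta_1 & 0 & 0 \\
0 & 0 & 2\beta d_{11} & \beta^2 & 1 \\
0 & 0 & 0 & 1 & 0 \\
0 & 0 & -d_{11} & -\beta & 0
\end{pmatrix},
\qquad
\det J
= \det\begin{pmatrix} 1 & 0 \\ \beta & 1 \end{pmatrix}
  \cdot
  \det\begin{pmatrix} 2\beta d_{11} & \beta^2 & 1 \\ 0 & 1 & 0 \\ -d_{11} & -\beta & 0 \end{pmatrix}
= 1 \cdot d_{11}.
\]
```

The $3 \times 3$ determinant is obtained by expanding along its middle row $(0, 1, 0)$, which leaves $(2\beta d_{11})(0) - (1)(-d_{11}) = d_{11}$.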

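Using (2.1), we can re-write the usual form of the bivariate normal
frequency function in $x_1$ and $x_2$ as

\[ f(x_1, x_2 \mid \eta_1, \alpha, \beta, d_{11}, d_{22}) = f(x_2 \mid x_1; \alpha, \beta, d_{11}) \times f(x_1 \mid \eta_1, d_{22}), \tag{2.2} \]

where

\[ f(x_2 \mid x_1; \alpha, \beta, d_{11}) = (d_{11}/2\pi)^{1/2} \exp\{-\tfrac{1}{2} d_{11}(x_2 - \alpha - \beta x_1)^2\}, \tag{2.2a} \]

\[ f(x_1 \mid \eta_1, d_{22}) = (d_{22}/2\pi)^{1/2} \exp\{-\tfrac{1}{2} d_{22}(x_1 - \eta_1)^2\}. \tag{2.2b} \]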


Now suppose that our data on $x_1$ and $x_2$ are divided up into two independent portions,
satisfying condition 1 of Theorem 1:

$W_1$, consisting of $n$ independent observations $(x_{1i}, x_{2i})$,
$i = 1, \ldots, n$, on both $x_1$ and $x_2$.

$W_2$, consisting of $n^*$ independent observations $x^*_{1j}$,
$j = 1, \ldots, n^*$, on $x_1$ alone; the $x^*_{2j} = y_j$ are not observed.

This is the well known missing observations problem (see Draper and Guttman, 1977,
and prior references listed therein). In the notation of Section 1, we write

\[ \phi = (\alpha, \beta, d_{11})' \tag{2.4} \]

for the vector of parameters of interest, and

\[ \theta = (\eta_1, d_{22})' \tag{2.5} \]

for the vector of nuisance parameters. To obtain a prior for $(\phi, \theta)$, we transform
the usual non-informative prior

\[ p(\mu_1, \mu_2, c_{11}, c_{22}, c_{12}) \propto (c_{11}c_{22} - c_{12}^2)^{-3/2} \tag{2.6} \]

using (2.1), remembering to insert the Jacobian. This provides

\[ p[(\alpha, \beta, d_{11}), (\eta_1, d_{22})] \propto d_{11}^{-1/2}\, d_{22}^{-3/2}, \tag{2.7} \]

from  which  it  is  clear  that  condition  4  of  Theorem  1  is  satisfied. 
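The step from the non-informative prior to the factorised form can be written out as a short worked calculation. It assumes the prior is proportional to $(c_{11}c_{22} - c_{12}^2)^{-3/2}$ and uses the relations in (2.1) together with the Jacobian $d_{11}$; the labels $a(\theta)$ and $b(\phi)$ follow condition 4.

```latex
\[
c_{11}c_{22} - c_{12}^2
= (d_{22} + \beta^2 d_{11})\, d_{11} - \beta^2 d_{11}^2
= d_{11} d_{22},
\]
\[
p[(\alpha, \beta, d_{11}), (\eta_1, d_{22})]
\propto (d_{11} d_{22})^{-3/2} \cdot d_{11}
= \underbrace{d_{11}^{-1/2}}_{b(\phi)} \; \underbrace{d_{22}^{-3/2}}_{a(\theta)} .
\]
```

The prior is a product of a function of $\phi = (\alpha, \beta, d_{11})'$ alone and a function of $\theta = (\eta_1, d_{22})'$ alone, which is the form required by (1.2).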

If we define

\[ W_{11} = (x_{1i},\ i = 1, 2, \ldots, n), \qquad W_{12} = (x_{2i},\ i = 1, 2, \ldots, n), \tag{2.8} \]

and employ (2.2) to (2.5), condition 3 of Theorem 1 is satisfied, and condition
2 is satisfied because of (2.2b). It follows that Theorem 1 applies and that
the posterior of $\phi$ given $(W_1, W_2)$ depends only on $W_1$. In fact it can be verified
directly that

\[ p(\phi \mid W_1, W_2) \propto d_{11}^{(n-1)/2} \exp[-\tfrac{1}{2} d_{11}\{S + (\phi_1 - \hat{\phi}_1)' X'X (\phi_1 - \hat{\phi}_1)\}], \tag{2.9} \]


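where

\[ S = \sum_{i=1}^{n} (x_{2i} - \hat{\alpha} - \hat{\beta} x_{1i})^2, \qquad \phi_1 = (\alpha, \beta)', \tag{2.10} \]

\[ X = (1, X_1), \]

$1$ is an $n \times 1$ vector of ones, $X_1 = (x_{11}, x_{12}, \ldots, x_{1n})'$, and

\[ \hat{\phi}_1 = (\hat{\alpha}, \hat{\beta})' = (X'X)^{-1} X' (x_{21}, x_{22}, \ldots, x_{2n})'. \tag{2.11} \]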

We note that (2.9) is what we would have obtained if we had calculated
$p(\phi \mid W_1)$ on the basis of $W_1$ alone, ignoring $W_2$.
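The quantities entering (2.9) can be computed from $W_1$ alone. The sketch below is an illustration under assumed simulated data, not the authors' computation: it forms the least-squares estimates of (2.11), the residual sum of squares $S$, and checks numerically the decomposition $\sum (x_{2i} - \alpha - \beta x_{1i})^2 = S + (\phi_1 - \hat{\phi}_1)'X'X(\phi_1 - \hat{\phi}_1)$ on which (2.9) rests. The function names and the data-generating values are hypothetical.

```python
import math
import random


def least_squares(x1, x2):
    """Return (alpha_hat, beta_hat) from regressing x2 on x1 with an intercept."""
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    sxx = sum((u - m1) ** 2 for u in x1)
    sxy = sum((u - m1) * (v - m2) for u, v in zip(x1, x2))
    beta_hat = sxy / sxx
    return m2 - beta_hat * m1, beta_hat


def residual_ss(x1, x2, alpha, beta):
    """Sum of squares of x2 - alpha - beta * x1."""
    return sum((v - alpha - beta * u) ** 2 for u, v in zip(x1, x2))


def quad_form(x1, d_alpha, d_beta):
    """(d_alpha, d_beta) X'X (d_alpha, d_beta)' with X = (1, X1)."""
    n, sx, sxx = len(x1), sum(x1), sum(u * u for u in x1)
    return n * d_alpha ** 2 + 2 * sx * d_alpha * d_beta + sxx * d_beta ** 2


def log_kernel(alpha, beta, d11, x1, x2):
    """Log of the right-hand side of (2.9), up to an additive constant.

    Only the complete pairs (W1) appear as arguments; W2 is not needed.
    """
    n = len(x1)
    a_hat, b_hat = least_squares(x1, x2)
    s = residual_ss(x1, x2, a_hat, b_hat)
    q = quad_form(x1, alpha - a_hat, beta - b_hat)
    return 0.5 * (n - 1) * math.log(d11) - 0.5 * d11 * (s + q)


# Simulated complete pairs (assumed values, for illustration only).
random.seed(0)
x1 = [random.gauss(1.0, 2.0) for _ in range(20)]
x2 = [0.5 + 1.5 * u + random.gauss(0.0, 1.0) for u in x1]

a_hat, b_hat = least_squares(x1, x2)
alpha, beta = 0.2, 1.1  # an arbitrary trial value of phi_1
lhs = residual_ss(x1, x2, alpha, beta)
rhs = residual_ss(x1, x2, a_hat, b_hat) + quad_form(x1, alpha - a_hat, beta - b_hat)
assert math.isclose(lhs, rhs, rel_tol=1e-9)
```

The signature of `log_kernel` makes the point of the example concrete: the extra observations on $x_1$ alone never enter the calculation.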

In the missing observations problem, we are usually concerned with inference
on the difference $\delta$ = [...]. This can involve use of the predictive distribution

\[ h(y \mid W_1, W_2) = \int f(y \mid x_1^{*}; \phi)\, p(\phi \mid W_1, W_2)\, d\phi, \]

and we see from the above that $p(\phi \mid W_1, W_2)$ can be replaced by $p(\phi \mid W_1)$ with the
same result. (Details are given by Draper and Guttman, 1977.)

This seems intuitively reasonable from (2.2a) and (2.2b). The distribution of $y$
conditional on $x_1^{*}$ depends on $\alpha$, $\beta$ and $d_{11}$, and the distribution of $x_1^{*}$ depends
only on $\eta_1$ and $d_{22}$ and provides no information on $\phi = (\alpha, \beta, d_{11})'$. Thus, in making
inferences about $\phi$, $x_1^{*}$ can effectively be ignored.

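Example 2.2. In this example, we suppose that $(x_1, \ldots, x_k)$ has
the multinomial distribution

\[ m_k(x_1, \ldots, x_k \mid \gamma_1, \ldots, \gamma_k) = \frac{n!}{x_1! \cdots x_k!\,(n - x_1 - \cdots - x_k)!}\; \gamma_1^{x_1} \cdots \gamma_k^{x_k}\, (1 - \gamma_1 - \cdots - \gamma_k)^{n - x_1 - \cdots - x_k}, \tag{2.12} \]

where $0 \le x_i \le n$, $0 \le \sum x_i \le n$, $0 < \gamma_i < 1$, and $\sum \gamma_i \le 1$.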


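Let $(1 \le k_1 \le k-1)$

\[
\begin{aligned}
\theta_1 &= \gamma_1, & \gamma_1 &= \theta_1, \\
\theta_2 &= \gamma_2, & \gamma_2 &= \theta_2, \\
&\ \ \vdots & &\ \ \vdots \\
\theta_{k_1} &= \gamma_{k_1}, & \gamma_{k_1} &= \theta_{k_1}, \\
\phi_1 &= \gamma_{k_1+1} \Big/ \Big(1 - \sum_{1}^{k_1} \gamma_i\Big), & \gamma_{k_1+1} &= \phi_1 \Big(1 - \sum_{1}^{k_1} \theta_i\Big), \\
&\ \ \vdots & &\ \ \vdots \\
\phi_{k_2} &= \gamma_k \Big/ \Big(1 - \sum_{1}^{k_1} \gamma_i\Big), & \gamma_k &= \phi_{k_2} \Big(1 - \sum_{1}^{k_1} \theta_i\Big),
\end{aligned}
\tag{2.13}
\]

where $k_2 = k - k_1$.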

We note that (2.13), viewed as a transformation from $(\gamma_1, \ldots, \gamma_k)$
to $(\theta_1, \ldots, \theta_{k_1}, \phi_1, \ldots, \phi_{k_2})$, has Jacobian whose absolute value
is $(1 - \sum_{1}^{k_1} \theta_i)^{k_2}$.

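Now, as is well known, we may write (2.12) as

\[ m_k(x_1, \ldots, x_k) = f(x_1, \ldots, x_{k_1})\, f(x_{k_1+1}, \ldots, x_k \mid x_1, \ldots, x_{k_1}), \tag{2.14} \]

where

\[ f(x_1, \ldots, x_{k_1}) = \frac{n!}{x_1! \cdots x_{k_1}!\,(n - \sum_{1}^{k_1} x_i)!}\; \theta_1^{x_1} \cdots \theta_{k_1}^{x_{k_1}} \Big(1 - \sum_{1}^{k_1} \theta_i\Big)^{n - \sum_{1}^{k_1} x_i}, \tag{2.14a} \]

\[ f(x_{k_1+1}, \ldots, x_k \mid x_1, \ldots, x_{k_1}) = \frac{(n - x_1 - \cdots - x_{k_1})!}{x_{k_1+1}! \cdots x_k!\,(n - \sum_{1}^{k} x_i)!}\; \phi_1^{x_{k_1+1}} \cdots \phi_{k_2}^{x_k}\, (1 - \phi_1 - \cdots - \phi_{k_2})^{n - \sum_{1}^{k} x_i}. \tag{2.14b} \]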
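The factorisation of the multinomial into a marginal for the first $k_1$ counts and a conditional for the remaining $k_2$ counts can be checked numerically. The sketch below uses exact fractions and arbitrary illustrative values for the cell probabilities and counts (these values, and the helper name `multinomial_pmf`, are assumptions made here, not taken from the report).

```python
from fractions import Fraction
from math import factorial, prod


def multinomial_pmf(x, probs, n):
    """Multinomial probability with len(x) listed cells plus one remainder cell.

    The remainder cell has count n - sum(x) and probability 1 - sum(probs),
    matching the form of (2.12).
    """
    rest = n - sum(x)
    coef = factorial(n) // (prod(factorial(v) for v in x) * factorial(rest))
    return coef * prod(p ** v for p, v in zip(probs, x)) * (1 - sum(probs)) ** rest


# Illustrative values (assumed): k = 4 listed cells, k1 = 2, n = 10.
gamma = [Fraction(1, 10), Fraction(1, 5), Fraction(3, 10), Fraction(1, 4)]
x = [2, 1, 3, 1]
n, k1 = 10, 2

# Reparametrisation (2.13): theta for the first k1 cells, phi for the rest.
theta = gamma[:k1]
phi = [g / (1 - sum(theta)) for g in gamma[k1:]]

joint = multinomial_pmf(x, gamma, n)                          # (2.12)
marginal = multinomial_pmf(x[:k1], theta, n)                  # (2.14a)
conditional = multinomial_pmf(x[k1:], phi, n - sum(x[:k1]))   # (2.14b)

assert joint == marginal * conditional                        # (2.14)
```

The marginal involves only `theta` and the conditional only `phi`, which is the structure that condition 3 of Theorem 1 requires.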


It is often the case that, given $x_1, \ldots, x_{k_1}$, the probability
of observing characteristic $k_1 + j$, $j = 1, \ldots, k_2$, is of interest and,
of course, $\phi_j = \gamma_{k_1+j} / (1 - \sum_{1}^{k_1} \gamma_i)$ is this conditional probability.

Suppose that, in gathering data to estimate $(\phi_1, \ldots, \phi_{k_2})'$,
the data on $(x_1, \ldots, x_k)$ has two independent pieces

\[ W_1 = (x_1^{(1)}, \ldots, x_k^{(1)})', \qquad W_2 = (x_1^{(2)}, \ldots, x_{k_1}^{(2)})', \tag{2.16} \]

where $W_1$ has probability function given by (2.14) and $W_2$ has probability
function given by (2.14a), with $1 \le k_1 \le k-1$. Suppose too that a priori,

\[ p(\gamma_1, \ldots, \gamma_k) \propto c. \tag{2.17} \]


Hence, consulting (2.13), and recalling that $\phi$ is of interest and $\theta$
is nuisance, we find

\[ p(\phi, \theta) \propto \Big(1 - \sum_{1}^{k_1} \theta_i\Big)^{k_2}. \tag{2.17a} \]


Hence, condition 4 of Theorem 1 is satisfied, and letting

\[ W_{11} = (x_1^{(1)}, \ldots, x_{k_1}^{(1)})', \qquad W_{12} = (x_{k_1+1}^{(1)}, \ldots, x_k^{(1)})', \]

it is easy to see that conditions 1-3 of Theorem 1 also hold. Hence, from Theorem
1, the marginal posterior of $\phi$, given $W_1$, $W_2$, does not depend on $W_2$.
Indeed, it is easy to see directly that the posterior of $\phi$, given
$W_1$ and $W_2$ is
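A sketch of the form this posterior takes can be read off from (1.3c), assuming the prior (2.17a), under which $b(\phi)$ is constant, and the conditional probability function (2.14b) evaluated at the counts in $W_1$. The superscript $(1)$ marks counts belonging to $W_1$; this is an illustration derived from the stated assumptions rather than a quotation of the authors' display.

```latex
\[
p(\phi \mid W_1, W_2)
\;\propto\;
\phi_1^{\,x^{(1)}_{k_1+1}} \cdots \phi_{k_2}^{\,x^{(1)}_{k}}
\,\bigl(1 - \phi_1 - \cdots - \phi_{k_2}\bigr)^{\,n - \sum_{1}^{k} x^{(1)}_i}.
\]
```

This has the kernel of a Dirichlet distribution in $(\phi_1, \ldots, \phi_{k_2})$, and the counts in $W_2$ do not appear in it.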
3. A SUBSIDIARY RESULT


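In this section, we suppose that the data $W = (W_1, W_2)$, with $W_1$ independent
of $W_2$, is such that

1. The density function of $W_1$ is such that

\[ f_{W_1}(w_1 \mid \phi_1, \phi_2, \theta) = f_{W_1}(w_1 \mid \phi_1, \phi_2), \tag{3.1} \]

and the density function of $W_2$ is such that

\[ f_{W_2}(w_2 \mid \phi_1, \phi_2, \theta) = f_{W_2}(w_2 \mid \theta). \tag{3.2} \]

2. $p(\phi_1, \phi_2, \theta) \propto a(\theta)\, b(\phi_1, \phi_2)$.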

We suppose interest is in $\phi_1$ alone, so that $\theta$ and $\phi_2$ are vectors of nuisance
parameters. We have

Theorem 2. If conditions 1 and 2 above hold, then the posterior of $\phi_1$, given
$W = (W_1, W_2)$, does not depend on $W_2$, given that $W_1$ and $W_2$ are independent.

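Proof. We have, since $W_1$ and $W_2$ are independent and conditions 1
and 2 hold, that

\[ p(\phi_1 \mid W_1, W_2) \propto \int_{\phi_2} \int_{\theta} a(\theta)\, b(\phi_1, \phi_2)\, f_W(w \mid \phi_1, \phi_2, \theta)\, d\theta\, d\phi_2 \tag{3.3a} \]

\[ = \int_{\phi_2} b(\phi_1, \phi_2)\, f_{W_1}(w_1 \mid \phi_1, \phi_2) \Big\{ \int_{\theta} a(\theta)\, f_{W_2}(w_2 \mid \theta)\, d\theta \Big\}\, d\phi_2 \tag{3.3b} \]

\[ \propto \int_{\phi_2} b(\phi_1, \phi_2)\, f_{W_1}(w_1 \mid \phi_1, \phi_2)\, d\phi_2, \tag{3.3c} \]

where the inner integral in (3.3b), a function of $W_2$ only, has been absorbed in
the constant of proportionality, and the theorem is proved.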

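Note that the posterior of $\phi_1$, given $W_1$ alone, need not exist, even though
the posterior of $\phi_1$, given $W_1$ and $W_2$, is independent of $W_2$. For a direct computation
of the posterior of $\phi_1$ given (only) $W_1$ yields (under the assumptions of Theorem 2),

\[ p(\phi_1 \mid W_1) \propto \int_{\phi_2} \int_{\theta} a(\theta)\, b(\phi_1, \phi_2)\, f_{W_1}(w_1 \mid \phi_1, \phi_2)\, d\theta\, d\phi_2 \tag{3.4a} \]

\[ = \int_{\phi_2} b(\phi_1, \phi_2)\, f_{W_1}(w_1 \mid \phi_1, \phi_2) \Big\{ \int_{\theta} a(\theta)\, d\theta \Big\}\, d\phi_2, \tag{3.4b} \]

but if the prior density of $\theta$, $a(\theta)$, is improper, then $p(\phi_1 \mid W_1)$ does not exist.
For further remarks on improper priors, see Dawid, Stone, and Zidek (1973).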

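An example is provided by the following:

\[ W_1 = \{(z_i, x_{2i});\ i = 1, \ldots, n\}, \quad \text{where the } z_i \text{ are constants,} \]

\[ f(x_{2i} \mid z_i; \alpha, \beta, \tau^2) = (2\pi\tau^2)^{-1/2} \exp\Big\{-\frac{1}{2\tau^2}(x_{2i} - \alpha - \beta z_i)^2\Big\}, \tag{3.5} \]

and

\[ W_2 = \{x_j^{*},\ j = 1, \ldots, n^{*}\}, \tag{3.6} \]

with

\[ \phi_1 = (\alpha, \beta)', \qquad \phi_2 = \tau^2, \qquad \theta = (\mu, \sigma^2)'. \tag{3.7} \]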


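Finally, we assume that

\[ p(\phi_1, \phi_2, \theta) \propto a(\theta)\, b(\phi_1, \phi_2), \tag{3.8} \]

where

\[ a(\theta) = a(\mu, \sigma^2) \propto 1/\sigma^2, \tag{3.8a} \]

\[ b(\phi_1, \phi_2) = b(\alpha, \beta, \tau^2) = \frac{|A|^{1/2}\, S_0^{(n_0-2)/2}}{2^{n_0/2}\, (\Gamma(\tfrac{1}{2}))^2\, \Gamma(\tfrac{n_0-2}{2})}\; (\tau^2)^{-(\frac{1}{2} n_0 + 1)} \exp\Big\{-\frac{1}{2\tau^2}[S_0 + Q]\Big\}, \tag{3.8b} \]

\[ Q = (\alpha - \alpha_0, \beta - \beta_0)\, A\, (\alpha - \alpha_0, \beta - \beta_0)', \tag{3.8c} \]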


with $A$ positive definite. It turns out that the posterior of $(\alpha, \beta)$,
given $W_1$ and $W_2$, is connected to the distribution of a bivariate t, degrees of
freedom $(n + n_0 - 2)$, which is independent of the $x_j^{*}$'s, that is, of $W_2$, and, further,
that the posterior of $\phi_1$, given $W_1$ alone, does not exist; these results may be
seen by substituting in (3.3c) and (3.4b).
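The two substitutions can be sketched as follows. The sketch assumes the likelihood of $W_1$ implied by (3.5), the prior (3.8a) to (3.8c), and writes $R(\alpha, \beta) = \sum_{i=1}^{n} (x_{2i} - \alpha - \beta z_i)^2$, a symbol introduced here for brevity. It uses the standard integral $\int_0^\infty u^{-(a+1)} e^{-b/u}\, du = \Gamma(a)\, b^{-a}$ for $a, b > 0$.

```latex
% Substituting in (3.3c): integrate out phi_2 = tau^2.
\[
p(\alpha, \beta \mid W_1, W_2)
\;\propto\;
\int_0^\infty (\tau^2)^{-\left(\frac{n + n_0}{2} + 1\right)}
\exp\Big\{-\frac{1}{2\tau^2}\,[S_0 + Q + R(\alpha, \beta)]\Big\}\, d\tau^2
\;\propto\;
[S_0 + Q + R(\alpha, \beta)]^{-(n + n_0)/2}.
\]

% Substituting in (3.4b): the factor contributed by theta = (mu, sigma^2).
\[
\int_{\theta} a(\theta)\, d\theta
= \int_{-\infty}^{\infty} \int_0^\infty \frac{1}{\sigma^2}\, d\sigma^2\, d\mu
= \infty .
\]
```

Since $Q + R(\alpha, \beta)$ is a quadratic form in $(\alpha, \beta)$, the first expression is a bivariate t kernel with exponent $-(\nu + 2)/2$ and $\nu = n + n_0 - 2$, and no $x_j^{*}$ appears in it. The second integral diverges, so the normalising constant in (3.4b) is not finite.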